Zhenyu Zhang

Native Audio-Visual Alignment for Generation

May 28, 2026

VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

May 22, 2026

The Last Human-Written Paper: Agent-Native Research Artifacts

Apr 27, 2026

OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

Apr 23, 2026

MARS-Dragonfly: Agile and Robust Flight Control of Modular Aerial Robot Systems

Apr 07, 2026

CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models

Apr 06, 2026

MedLoc-R1: Performance-Aware Curriculum Reward Scheduling for GRPO-Based Medical Visual Grounding

Mar 30, 2026

Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping

Mar 25, 2026

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Mar 24, 2026

CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention

Mar 18, 2026